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Abstract 

A population genetics formulation of Eigen's molecular quasis- 
pecies model [Naturwissenchaften 58, 465 (1971)] is proposed and sev- 
eral simple replication landscapes are investigated analytically. Our 
results show a remarkable similarity to those obtained with the original 
kinetics formulation of the quasispecies model. However, due to the 
simplicity of our approach, the space of the parameters that define the 
model can be thoroughly explored. In particular, for the single-sharp- 
peak landscape our analysis yields some interesting predictions such 
as the existence of a maximum peak height and a minimum molecule 
length for the onset of the error threshold transition. 

1 Introduction 

The molecular quasispecies model introduced by Manfred Eigen more than 
twenty years ago JT] has become a major framework of the research on the 
dynamics of competing self- reproducing macromolecules (see || for a review). 
In this model, a molecule is represented by a string of v digits (si, S2, . . . , s u ), 
with the variables Sj allowed to take on k different values, each of which 
representing a different type of monomer used to build the molecule. The 
number of different types of molecules is thus k v . The concentrations X{ of 
molecules of type % = 1, 2, . . . , k v evolve in time according to the following 
differential equations 

^ = E W vXj - [A + $ (t)) x t , (1) 
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where the constants D{ stand for the death probability of molecules of type i 
and $(t) is a dilution flux that keeps the total concentration constant. The 
elements of the replication matrix Wij depend on the replication rate Aj 
of molecules of type % as well as on the Hamming distance d(i,j) between 
strings % and j. They are given by 

W ii = A i q» (2) 

and 

Wv = {k _^ d{iJ) 1 V - d(i ' j) (1 ~ if*'* (3) 

where < q < 1 is the single-digit replication accuracy, which is assumed 
to be the same for all digits. Perhaps the main outcome of the quasispecies 
model is that, for a given replication accuracy, there exists a maximum string 
length which selection can maintain. This phenomenon, termed the error 
threshold, poses a serious difficulty in envisioning life as an emergent property 
of systems of competing self-replicating macromolecules. It seems that some 
sort of cooperation between the macromolecules must be incorporated in the 
model in order to avoid this error catastrophe |3|, [|] . 

In this paper we employ a classic population genetics approach || to in- 
vestigate the evolution of an infinite population of self-replicating molecules. 
To accomplish that we have to make two simplifying assumptions to the orig- 
inal quasispecies model. First, we assume that molecules composed of the 
same number of monomers of each type are equivalent, i.e., possess the same 
replication rate, regardless of the particular positions of the monomers inside 
the molecules. Hence, a given molecule is characterized solely by the vector 
P = (Pi, P2, . . . , P K ) where P a is the number of monomers of type a in that 
molecule. Since J2a Pa = v, the number of different types of molecules is 
reduced to (y + k — 1)!/V! (k — 1)!. Second, in the population genetics ap- 
proach we focus on the evolution of the monomer frequencies, rather than on 
the evolution of the molecule frequencies or concentrations. Henceforth the 
variable t will denote the number of nonoverlapping generations or simply the 
generation number. We assume then that, given the monomer frequencies in 
generation t, p a (t) with J2aPa(t) — 1 ; the molecule frequencies are given by 
the multinomial distribution 

U t (P) = C» P [ Pl (t)} p i [p 2 (t)] p > . . . \p K {t)] PK (4) 
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where Cp = v\jP^.P^. . . . P K \. Thus, in each generation the monomers are 
sampled with replacement from an infinite monomer-pool. The effects of 
random drift are neglectable because the population of molecules is also infi- 
nite. The changes in the monomer frequencies are due then to the driving of 
natural selection, modelled by the replication rate A(P), and to mutations, 
modelled by the error rate per digit 1 — q. A similar assumption was em- 
ployed in the analytical study of the effects of learning on evolution || . With 
these assumptions we are able to study analytically the dynamical behavior 
of the model in the full space of the control parameters u, q, k, and repli- 
cation landscapes A(P). In particular, while previous investigations |2| have 
almost exclusively dealt with binary strings (k = 2), our population genetics 
approach readily applies to the analysis of more complex strings. 

A worth mentioning result concerning the quasispecies model is the ex- 
istence of a correspondence between the ordinary differential equations ([I]) 
and the equilibrium properties of a surface lattice systems 0. However, 
from an operational viewpoint, it seems easier to solve directly the ordinary 
differential equations than to use cumbersome statistical mechanics tools to 
obtain the surface equilibrium properties of the corresponding lattice system 
for finite v Actually, most of the statistical mechanics analyses of the 
quasispecies model are restricted to the limit v — > oo ]7|, §, [|. 

The remainder of this paper is organized as follows. In Sec. ^] we derive 
the equations governing the evolution of the monomer frequencies. To better 
appreciate the consequences of our simplifying assumptions, these equations 
are solved for several simple replication landscapes that have already been 
thoroughly analysed in the literature ||, ||. In Sec. ^| we discuss our results 
and present some concluding remarks. In particular, we point out how the 
model can be generalized so as to include sexual reproduction between the 
molecules. 

2 The model 

We now proceed with the derivation of the basic recursion relations for the 
monomer frequencies. The fraction of monomers of type a that a molecule 
characterized by P contributes to generation t + 1 is proportional to the prod- 
uct of three factors: (a) its frequency in the population n t (P) in generation 
t, (b) its replication rate A(P), and (c) the average number of monomers a 
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that replicate correctly, qP a , plus the average number of monomers (3 7^ a 
that due to replication errors mutate to a, (1 — — 1) J2p^aP/3- After 

some simple algebra it yields 

p a (t + l) = ^— (l- q + ^l A(P) P a ) , (5) 

k-1 y w t y j 

where the normalization factor w t is the average replication rate of the entire 
population in generation t, 

w t = vY l U t {P)A{P). (6) 
p 

Here the notation J2p stands for J2p 1= o ■ ■ ■ Ep K= o 8 ( v i S« Pa), where 8(k, I) 
is the Kronecker delta. To proceed further we must specify the replication 
rate A(P) of each molecule type, i.e., specify the replication landscape. 



2.1 Single sharp maximum 



In this case we ascribe replication rate a to the molecule composed of v 
monomers of type 1 and replication rate 1 to all the remaining molecules, i.e., 
A(P) = a if P = (1/, 0, . . . , 0) and A(P) = 1 otherwise. This is the simplest 
and probably the most studied replication landscape 0, because it clearly 
shows that although the so-called master string (is, 0, . . . , 0) has no match its 
chance of successfully taking over the population depends nontrivially on the 
values of the control parameters as well as on the initial monomer frequencies 
p a (0). Hence Eq. (f|) reduces to 
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K — 1 



1 — q + (kq — 1) 



and 



p a (t + 1) 



k - 1 



1 — q + (nq — I) 



Pl (t) + (a - 1) [ Pl (t)Y 
1 + (a - 1) [ Pl (t)r 

Pa(t) 



(7) 



l + (a-l)\p 1 (t)] 1 



a^l. (8) 



For simplicity, we keep the symmetry between monomers of type a 7^ 1 by 
setting their initial frequencies to p a (0) = (1 — Pi(0)) / (k — 1). Furthermore 
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we bias the initial population towards the master string by choosing pi(0) ~ 
1. 

In Fig. |T] we present the steady-state molecule frequencies, obtained by 
solving the recursion relation (|7]) for v = 10, a = 50 and function 
of the error rate per digit 1—q. Three distinct regimes can be identified. First, 
the direct replication regime (DR), that occurs for 1 — q < 1 — q t m 0.241, 
is characterized by a molecular population composed of a cloud of mutants 
around the master string, termed a quasispecies. In this regime there is a 
high proportion of type 1 monomers, i.e., the fixed point is p\ ~ 1. This fixed 
point disappears discontinuously at the error threshold 1 — q t . We note that, 
in contrast to the original quasispecies model, the error threshold transition 
is discontinuous. Second, the stochastic replication regime (SR), that occurs 
for 1 — q > 1 — q t , is characterized by the fixed point p\ ~ 1/2, which 
corresponds to an almost uniform distribution of monomer types. Third, 
the complementary replication regime (CR) that sets in when the replication 
error is so high (1 — q > 0.86) that the monomers are almost certain to 
mutate, so that the population oscilates between the quasispecies and its 
complement. This regime exists only for binary strings, since only in this 
case the complement of a string is unique. 

Some comments regarding the role of the initial monomer frequencies 
p Q (0) are in order. For 1 — q < 0.239 the high p\ fixed point is the only 
stable fixed point of (^). Above that value, a second stable fixed point, 
p* ps 1/2, appears. These fixed points compete, such that there is an all-or- 
none selection. The winner, however, is not determined by the replication 
rate only, but also by the initial monomer frequency p\(0). As mentioned 
above, the high p\ fixed point disappears at the error threshold transition. 
We will return to this issue in the analysis of the competition between two 
sharp maxima. 

The behavior pattern of the molecule frequencies for k > 2 is qualitatively 
similar to that discussed above: the DR and SR regimes are characterized 
by the fixed points p\ ~ 1 and p\ ~ l//c, respectively, while the CR regime 
is absent. 

In the following we will focus on the dependence of the error threshold 
1 — q t on the control parameters. The fixed points p\(t + 1) = pi(t) = pi of 
the recursion relation (|7|) are the roots of / (p) = where 

/ (p) = (1 - q) [ K p - 1 + (k - 1) (a - 1) p"] - (a - 1) (1 - p) p v . (9) 
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Numerical analysis of this function indicates that the discontinuous disap- 
pearance of the fixed point p\ ~ 1, that is the cause of the error threshold 
phenomenon in our model, coincides with the appearance of a double root of 
f{p). Hence, the error threshold 1 — q t can be easily determined by solving 
f{p) = and df(p)/dp = for p and q = q t simultaneously Eliminating the 
term p L of these two equations yields the following quadractic equation for 

Kup 2 - [1 + u + qn(u - 1)] p + qu = 0, (10) 

which possesses real roots either for 5K<lor?K> [(v+l)/(v - 1)] . Only 
the latter is relevant for the analysis of the error threshold, since this phe- 
nomenon occurs in the high replication accuracy region. The disappearance 
of the high pi fixed point is associated to the larger root of (0), while the 
smaller root is related to the appearance of the uniform fixed point p\ ~ 1/k. 
In order to avoid the error threshold discontinuous transition we must set the 
control parameters so as to violate the second inequality. In particular, for 
v and k fixed, the discontinuous transition line qt = <?t(a) terminates at the 
critical point 

1 (v + 1\ 2 , , 

K \V — 1/ 

ac=1+K {—l) («-l)(^-l) ' (12) 

which for large v become q c ~ 1/k and a c pa e~ 2 K u , respectively. We note 
that the critical point coordinates p c , q c and a c are obtained by solving the 
three equations f(p) = 0, df(p)/dp = and d 2 f(p)/dp 2 = simultaneously. 
The condition q c < 1 implies that there is a minimum string length, 

y K — 1 

below which the error catastrophe does not occcur. In Fig. |2| we present 
the phase diagram in the space (1 — q, a) for v — 10 and k — 2. The 
discontinuous transition between the phases DR and SR ends at the critical 
point 1 — q c = 0.251 and a c = 58.01, while the transition between the phases 
SR and CR seems to never disappear. Thus, for a given value of v it is always 
possible to choose a sufficiently large value of a so that the phases CR and 
SR are no longer distinguishable. To the best of our knowledge, there is no 
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similar result reported for the original quasispecies model. It must be noted, 
however, that due to the numerical difficulty of solving the set of k u ordinary 
differential equations (|TJ), the space of parameters has not been adequately 
explored for that model. In fact, the computational effort needed to study the 
evolution of a population of molecules composed of more than two types of 
monomers (k > 2) is so large that the important problem of the dependence 
of the error threshold 1 — q t on the number of monomer types k has remained 
unaddressed so far. In the population genetics framework, however, the 
number of recursion relations increases linearly with k, so this parameter 
does not introduce any particular difficulty to our analysis. Moreover, for 
the replication landscapes considered in this paper, in which the replication 
rates of the molecules are determined by one type of monomer only, the 
problem is reduced to the solution of a single recursion relation. In Fig. |3] 
we present the dependence of 1 — q t on k for a = 10 and several values of v. 
Note that beyond k ~ 4 the error threshold is almost insensitive to further 
increase of n. Hence, in order to maximize the information content of the 
quasispecies, it is advantageous to choose n as large as possible. Different 
values of a do not produce any qualitative change in this figure. 

2.2 Single smooth maximum 

In what follows we will consider the 2 only. We assume that the 

replication rates of the molecules increase with the number of monomers of 
type 1 they possess, irrespective of the other monomer types. More specifi- 
cally, 

A(P 1 ) = l + (a-l) (j^J (14) 

if Pi > // and A (Pi) = 1 otherwise. Here /i is an integer that can take 
on the values 0, 1, . . . , v — 1, and 7 is a real, positive variable. Clearly, fi 
measures the size of the flat region of the replication landscape, while the 
parameter 7 determines the smoothness of the landscape near /i: the larger 
7, the smoother the landscape. The same procedure employed in the analysis 
of the single-sharp maximum, which is recovered for [i = u — 1, can be used to 
investigate the error threshold transition for the smooth replication landscape 
(|T4T). In particular, in Fig. |3] we show the error rate per digit at the threshold 
transition, 1 — q t , as a function of the exponent 7 for v = 20, a = 10 and 
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several values of /i. For a given ji there is a critical value 7 C below which 
the error threshold phenomenon does not occur. This exponent is shown 
in Fig. |5| as a function of the ratio fi/u. It is clear from these figures that 
broad maxima (small /i) can resist longer to the error catastrophe or even 
avoid it depending on the value of the exponent 7. Large values of 7 actually 
increase the size of the flat region and so they favour the appearance of the 
error threshold. Our results are in agreement with a comment by Tarazona || 
that the exponent with which the replication landscape goes flat is germane 
to the onset of the error threshold transition. Different values of v and a do 
not change qualitatively these results. 

2.3 Two sharp maxima 

As before, we assume that A(P) depends on Pi only. In this case the repli- 
cation landscape consists of two sharp maxima A(P\ = 0) = Aq, A(P\ = 
v) = A u , and A (Pi) = 1 otherwise. In order to illustrate the role of the 
initial monomer frequencies we present in Figs. |6|, |^ and [8] the frequency 
of type 1 monomers as a function of the generation number t for v = 20, 
k = 2, A = 200, A 2 o = 10 and several initial frequencies. The evolution for 
1 — q = is shown in Fig. |6|. There are only two stable fixed points, namely, 
p* — 1 and pi = 0. Despite the large difference between the replication rates 
of the molecules associated to these fixed points, their basins of attraction 
are practically of the same size. They would be strictly equal if A = A 2 o- 
The main effect of a large replication rate in this case is to speed up the 
convergence to the low p\ fixed point. By increasing the error rate a new 
stable fixed point p\ ~ 1/2, associated to the stochastic replication regime, 
appears. The interplay of the three stable fixed points is shown in Fig. [7] for 
1 — q = 0.01. For nonzero replication error rates, the basin of attraction of the 
low pi fixed point is considerably larger than that of the high pi fixed point. 
Of course, their basins of attraction have actually decreased as compared 
with the case 1 — q — 0. We note that the two quasispecies do no coexist: for 
a given initial population there is an all-or-none selection. Finally, in Fig. |8] 
we present the evolution for 1 — q = 0.06. The high p\ fixed point, associated 
to the molecule with the smaller replication rate, has disappeared and the 
stochastic replication fixed point has taken over its basin of attraction. Fur- 
ther increase of the error rate 1 — q will eventually lead to the disappearance 
of the low pi fixed point too. 
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Within this framework we can easily study the competition between a 

The results show the 



sharp maximum and a broad or smooth maximum [10 



same qualitative features as those presented above. In particular, since the 
broader maximum possesses the larger error threshold 1 — q t (see Fig. f| ) it 
plays the same role as the larger replication rate maximum. We note that, in 
contrast to the original quasispecies model, there is no selection transition in 
our model which would amount to a discontinuous transition between 



the low and the high p\ fixed points, i.e., the former should take over the 
basin of attraction of the latter. 



3 Discussion 

An interesting extension of the quasispecies model is the possibility of two 
molecules exchanging matter during a collision. Clearly, the analogous to 
this phenomenon in the population genetics approach is sexual reproduc- 
tion. More specifically, the collision (mating) between the molecules (par- 
ents) (js{, . . . , sl^j and (s™, . . . , s™) produces the new molecules (offspring) 

(s{, . . . , s£_ l3 s™, . . . , s™) and ( sj 1 , . . . , s™_ 1; s£, . . . , s[) , where the digit < 
c < v is the so-called crossover point. The number of offspring depends, of 
course, on the replication rate of the parent molecules. Using the assump- 
tions presented in Sec. [I], it is straighforward to derive the following recursion 
relations for the evolution of the monomer frequencies: 



p a (t + l) 



k - 1 
where 



M/ " 1 E E MP f , P m )A(P f , P m )(Pl + Pa) 



2w t 

<■ pf pn 



(15) 



»t = ^EE n *( p/ . pm ) A (p f , p m ) (is) 

pf pm 

is the average replication rate of the entire population and 

Yl t {pf,P m ) = Ii t {pf)Yl t {P m ) (17) 

is the frequency of the collisions or matings between the molecules P* and 
P m . Here A(P*, P m ) determines the number of offspring generated by the 
mating between these two molecules. As expected since the positions of 
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the monomers inside the molecules play no role in our population genetics 
approach, the basic recursion relations are independent of the crossover 
point c. It is interesting to note that this equation reduces to eq. (|5|) in the 
case that A(P?, P m ) = A(Pf)A(P m ) and so the two reproduction modes, 
asexual and sexual, yield the same results. We have investigated the steady- 
state solutions of (|T5| ) under a variety of conditions, but found no noteworthy 
difference from the previously presented results. We only mention that by 
penalizing matings within a same class, i.e., A(Pf , P m ) = 1 if P* = P m we 
can obtain the formation of a quasispecies (a master string surrounded by a 
cloud of mutants) even in the regime of perfect replication accuracy q = 1. 

The critical, though natural, assumption of the population genetics ap- 
proach proposed in this paper is the use of the multinomial distribution (|4j) 
for the molecule frequencies. Since this is a single-peaked distribution, the 
coexistence of two or more quasispecies, which could only be described by a 
multi-peaked distribution, is prevented apriori. In the original quasispecies 
model such a coexistence is possible only in the case of degenerate quasis- 
pecies. This is an important issue, since it would be highly desirable to 
study the spontaneous formation of hypercycles within the framework of the 
quasispecies model [||, ?]. 

In this paper we have presented a population genetics formulation of the 
classic quasispecies model proposed by Eigen Q. Owing to its extreme sim- 
plicity, this formulation may be useful, in the sense of having the value of 
an approximation, to tackle problems for which the numerical difficulty of 
solving the ordinary differential equations (|l|) or employing the statistical 
mechanics approach makes the analysis prohibitive. Furthermore, even 
for the well-studied replication landscape that consists of a single sharp peak, 
our population genetics analysis has yielded some interesting and unexpected 
results such as the existence of a maximum peak height ( |T2D and a minimum 
string length (|13|) for the onset of the error catastrophe. It would be interest- 
ing to investigate whether similar bounds exist for the original quasispecies 
model. 
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Figure 1: Steady-state frequencies of molecules of type Pi — 10 (master 
string), 9, 8, 7, 6, 5, 4 and as a function of the error rate per digit 1 — q for 
v = 10, a = 50 and k, = 2. The error threshold transition occurs at 1 — q t ~ 
0.241 and the complementary replication regime sets in for 1 — q > 0.86. 



Figure 2: Phase diagram in the space (1 — q, a) for v = 10 and k = 2. The 
discontinuous transition between the phases DR and SR ends at the critical 
point 1 — q c = 0.251 and a c = 58.01. 
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Figure 3: Error threshold 1 — q t as a function of the number of monomer 
types k for a = 10 and (from top to bottom) v — 8, 10, 12, 14, 16, 18 and 
20. The lines are guides to the eyes. 



Figure 4: Error threshold 1 — qt as a function of the exponent 7 for v = 20, 
a = 10 and (from left to right) /jl/v — 0.75, 0.5, 0.25 and 0. The curves begin 
at 7 C = 7 c (/i). 



Figure 5: Critical value of the exponent 7 as a function of the ratio \ijv for 
v = 20 and a = 10. Below the curve the error threshold phenomenon does 
not occur. 



Figure 6: Frequency of monomers of type 1 as a function of the generation 
number for the two-sharp-maxima replication landscape and several initial 
frequencies Pi(0). The parameters are v = 20, k = 2, A = 200, A 2 o = 10 
and 1 — q = 0. 



Figure 7: Same as Fig. 6, but for 1 — q = 0.01. 



Figure 8: Same as Fig. 6, but for 1 — q = 0.06. 
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